1 Introduction

Iterative Linear Association Analysis (ILAA) is a computational method that estimates, from a sample of multidimensional data, a linear transformation that removes the linear associations between the data variables. The returned transformation matrix can be used to:

  1. Do an exploratory analysis of latent variables and their association to all the observed variables

  2. Do exploratory discovery of latent variables associated with a specific outcome target

  3. Address multicollinearity issues in linear regression models

    1. Better estimation and interpretation of model variables

    2. Improve linear model performance

  4. Simplify the multidimensional search space for many ML algorithms

The objective of this tutorial is to guide users in using ILAA to accomplish the tasks listed above.

1.1 The Libraries

ILAA is a wrapper around the more general data decorrelation algorithm (IDeA). Both are implemented in R and are part of the FRESA.CAD 3.4.6 package.

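## From GitHub (requires the devtools package)
# devtools::install_github("joseTamezPena/FRESA.CAD")

## For ILAA
library("FRESA.CAD")

## For network analysis
library(igraph)

## For str_remove() and str_remove_all(), used in later sections
library(stringr)

## Save the default graphical parameters; later sections restore them with par(op)
op <- par(no.readonly = TRUE)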

1.2 Material and Methods

For this tutorial I’ll use the body-fat prediction data set. The data was downloaded from Kaggle:

https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset

The Kaggle data disclaimer:

“Source: The data were generously supplied by Dr. A. Garth Fisher who gave permission to freely distribute the data and use for non-commercial purposes.

Roger W. Johnson, Department of Mathematics & Computer Science, South Dakota School of Mines & Technology, 501 East St. Joseph Street, Rapid City, SD 57701

email address: web address: http://silver.sdsmt.edu/~rwjohnso”

1.3 Loading the Data

The following code snippet loads the data, removes the density column, and computes the Body Mass Index (BMI). Subjects whose BMI exceeds 50 are treated as data errors and removed.

body_fat <- read.csv("~/GitHub/LatentBiomarkers/Data/BodyFat/BodyFat.csv", header=TRUE)

### Removing density as estimator
body_fat$Density <- NULL

body_fat$BMI <- 10000*body_fat$Weight*0.453592/((body_fat$Height*2.54)^2)
## Removing subjects with data errors
body_fat <- body_fat[body_fat$BMI<=50,]
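
The constants in the BMI line convert pounds to kilograms (0.453592) and inches to centimeters (2.54), which implies that Weight is recorded in pounds and Height in inches; the factor 10000 converts cm² to m². As a quick sanity check, the sketch below compares the tutorial expression with the usual kg/m² definition. The subject values are hypothetical and are not taken from the data set.

```r
# Hypothetical subject: 154 lb, 68 in (illustrative values only)
weight_lb <- 154
height_in <- 68

# Same expression as in the tutorial
bmi_tutorial <- 10000 * weight_lb * 0.453592 / ((height_in * 2.54)^2)

# Textbook definition: kilograms divided by squared meters
weight_kg <- weight_lb * 0.453592
height_m  <- height_in * 0.0254
bmi_reference <- weight_kg / height_m^2

print(c(tutorial = bmi_tutorial, reference = bmi_reference))
```

Both values should agree up to floating-point rounding.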

1.4 ILAA Unsupervised Processing

The ILAA function is:

 ILAA(data=NULL,
                thr=0.80,
                method=c("pearson","spearman"),
                Outcome=NULL,
                drivingFeatures=NULL,
                maxLoops=100,
                verbose=FALSE
      )

where:

  • data: The source data-frame

  • thr : The target correlation goal (default 0.80).

  • method : The correlation measure, "pearson" or "spearman".

  • Outcome : The name of the target variable; required for supervised learning.

  • drivingFeatures : A set of variables intended to remain as unaltered basis vectors.

  • maxLoops : The maximum number of iteration cycles.

  • verbose : If TRUE, display the evolution of the algorithm.

By default, the ILAA function targets a maximum correlation lower than 0.8 using the Pearson correlation measure. The user is free to choose robust fitting with the Spearman correlation measure and/or to lower the threshold to reduce the level of feature association. The following snippets show the different options.


# Default call
body_fat_Decorrelated <- ILAA(body_fat)
pander::pander(colnames(attr(body_fat_Decorrelated,"UPLTM")))

Weight, La_Neck, La_Chest, La_Abdomen, La_Hip, La_Thigh, La_Knee, La_Biceps and La_BMI


# Explore the convergence metrics in verbose mode
body_fat_Decorrelated <- ILAA(body_fat,verbose=TRUE)

fast | LM | Included: 15 , Uni p: 0.01 , Outcome-Driven Size: 0 , Base Size: 6 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.800>, Top: 1( 5 )1 : 1 Fa= 2 : 0.800,<|>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.735

3 <R=0.735,thr=0.800>

[ 3 ], 0.4485365 Decor Dimension: 9 Nused: 9 . Cor to Base: 6 , ABase: 1 , Outcome Base: 0

pander::pander(colnames(attr(body_fat_Decorrelated,"UPLTM")))

Weight, La_Neck, La_Chest, La_Abdomen, La_Hip, La_Thigh, La_Knee, La_Biceps and La_BMI


# Robust linear fitting with the Spearman correlation measure
body_fat_Decorrelated <- ILAA(body_fat,method="spearman",verbose=TRUE)

spearman | RLM | Included: 15 , Uni p: 0.01 , Outcome-Driven Size: 0 , Base Size: 7 , Rcrit: 0.1467743

1 <R=0.929,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.872

2 <R=0.872,thr=0.800>, Top: 1( 4 )1 : 1 Fa= 2 : 0.800,<>Tot Used: 8 , Added: 4 , Zero Std: 0 , Max Cor: 0.837

3 <R=0.837,thr=0.800>, Top: 1( 1 )1 : 1 Fa= 3 : 0.800,<>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.812

4 <R=0.812,thr=0.800>, Top: 1( 1 )1 : 1 Fa= 3 : 0.800,<>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.781

5 <R=0.781,thr=0.800>

[ 5 ], 0.7666132 Decor Dimension: 9 Nused: 9 . Cor to Base: 5 , ABase: 2 , Outcome Base: 0

pander::pander(colnames(attr(body_fat_Decorrelated,"UPLTM")))

Weight, Height, La_Neck, La_Chest, La_Abdomen, La_Hip, La_Thigh, La_Knee and La_BMI


# Lowering the threshold
body_fat_Decorrelated <- ILAA(body_fat,thr=0.4,verbose=TRUE)

fast | LM | Included: 15 , Uni p: 0.01 , Outcome-Driven Size: 0 , Base Size: 2 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.750>, Top: 1( 5 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.735

3 <R=0.735,thr=0.600>, Top: 1( 4 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 4 , Zero Std: 0 , Max Cor: 0.742

4 <R=0.742,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 1 , Zero Std: 0 , Max Cor: 0.526

5 <R=0.526,thr=0.450>, Top: 2( 1 )1 : 2 Fa= 3 : 0.450,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.540

6 <R=0.540,thr=0.450>, Top: 1( 2 )1 : 1 Fa= 4 : 0.450,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.727

7 <R=0.727,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 4 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.849

8 <R=0.849,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 4 : 0.750,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.577

9 <R=0.577,thr=0.450>, Top: 2( 1 )1 : 2 Fa= 4 : 0.450,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.540

10 <R=0.540,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 4 : 0.450,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.449

11 <R=0.449,thr=0.400>, Top: 2( 1 )1 : 2 Fa= 5 : 0.400,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.375

12 <R=0.375,thr=0.400>

[ 12 ], 0.374645 Decor Dimension: 15 Nused: 15 . Cor to Base: 13 , ABase: 2 , Outcome Base: 0

pander::pander(colnames(attr(body_fat_Decorrelated,"UPLTM")))

La_BodyFat, Age, Weight, La_Height, La_Neck, La_Chest, La_Abdomen, La_Hip, La_Thigh, La_Knee, La_Ankle, La_Biceps, La_Forearm, La_Wrist and La_BMI


# Trying to achieve the maximum independence between variables, i.e., thr=0.0
body_fat_Decorrelated <- ILAA(body_fat,thr=0.0,verbose=TRUE)

fast | LM | Included: 15 , Uni p: 0.01 , Outcome-Driven Size: 0 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.750>, Top: 1( 5 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.735

3 <R=0.735,thr=0.600>, Top: 1( 4 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 4 , Zero Std: 0 , Max Cor: 0.742

4 <R=0.742,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 1 , Zero Std: 0 , Max Cor: 0.526

5 <R=0.526,thr=0.450>, Top: 2( 2 )1 : 2 Fa= 2 : 0.450,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.633

6 <R=0.633,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.703

7 <R=0.703,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.633

8 <R=0.633,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.734

9 <R=0.734,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.621

10 <R=0.621,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.575

11 <R=0.575,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 3 : 0.450,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.556

12 <R=0.556,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 3 : 0.450,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.449

13 <R=0.449,thr=0.200>, Top: 2( 7 )1 : 2 Fa= 4 : 0.200,<|>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.404

14 <R=0.404,thr=0.200>, Top: 2( 4 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.413

15 <R=0.413,thr=0.200>, Top: 1( 5 )1 : 1 Fa= 6 : 0.200,<|>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.305

16 <R=0.305,thr=0.200>, Top: 3( 1 )1 : 3 Fa= 7 : 0.200,<|>Tot Used: 15 , Added: 3 , Zero Std: 0 , Max Cor: 0.197

17 <R=0.197,thr=0.147>, Top: 2( 3 )1 : 2 Fa= 9 : 0.147,<|>Tot Used: 15 , Added: 6 , Zero Std: 0 , Max Cor: 0.213

18 <R=0.213,thr=0.200>, Top: 1( 1 )1 : 1 Fa= 9 : 0.200,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.165

19 <R=0.165,thr=0.147>, Top: 3( 2 )1 : 3 Fa= 11 : 0.147,<|>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.169

20 <R=0.169,thr=0.147>, Top: 2( 2 )1 : 2 Fa= 12 : 0.147,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.162

21 <R=0.162,thr=0.147>, Top: 2( 1 )1 : 2 Fa= 12 : 0.147,<>Tot Used: 15 , Added: 0 , Zero Std: 0 , Max Cor: 0.162

[ 21 ], 0.1620427 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 1 , Outcome Base: 0

pander::pander(colnames(attr(body_fat_Decorrelated,"UPLTM")))

La_BodyFat, La_Age, Weight, La_Height, La_Neck, La_Chest, La_Abdomen, La_Hip, La_Thigh, La_Knee, La_Ankle, La_Biceps, La_Forearm, La_Wrist and La_BMI

I’ll set the correlation goal to 0.2 in verbose mode. Then I’ll continue the tutorial using this output.


# Calling ILAA to achieve a final correlation of 0.2
body_fat_Decorrelated <- ILAA(body_fat,thr=0.2,verbose=TRUE)

fast | LM | Included: 15 , Uni p: 0.01 , Outcome-Driven Size: 0 , Base Size: 1 , Rcrit: 0.1467743

1 <R=0.944,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888

2 <R=0.888,thr=0.750>, Top: 1( 5 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.735

3 <R=0.735,thr=0.600>, Top: 1( 4 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 4 , Zero Std: 0 , Max Cor: 0.742

4 <R=0.742,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 1 , Zero Std: 0 , Max Cor: 0.526

5 <R=0.526,thr=0.450>, Top: 2( 2 )1 : 2 Fa= 2 : 0.450,<|>Tot Used: 15 , Added: 2 , Zero Std: 0 , Max Cor: 0.633

6 <R=0.633,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.703

7 <R=0.703,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.633

8 <R=0.633,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.734

9 <R=0.734,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.621

10 <R=0.621,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.575

11 <R=0.575,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 3 : 0.450,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.556

12 <R=0.556,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 3 : 0.450,<|>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.449

13 <R=0.449,thr=0.200>, Top: 2( 7 )1 : 2 Fa= 4 : 0.200,<|>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.404

14 <R=0.404,thr=0.200>, Top: 2( 4 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.413

15 <R=0.413,thr=0.200>, Top: 1( 5 )1 : 1 Fa= 6 : 0.200,<|>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.305

16 <R=0.305,thr=0.200>, Top: 3( 1 )1 : 3 Fa= 7 : 0.200,<|>Tot Used: 15 , Added: 3 , Zero Std: 0 , Max Cor: 0.197

17 <R=0.197,thr=0.200>

[ 17 ], 0.1973902 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 1 , Outcome Base: 0

1.4.1 Data Frame Attributes

The returned data matrix contains the following attributes:

  attr(body_fat_Decorrelated,"UPLTM")            #The transformation matrix
  attr(body_fat_Decorrelated,"fscore")           #The score of each feature
  attr(body_fat_Decorrelated,"drivingFeatures")  #The list of driving features
  attr(body_fat_Decorrelated,"unaltered")        #The list of unaltered features
  attr(body_fat_Decorrelated,"LatentVariables")  #The list of latent variables
  attr(body_fat_Decorrelated,"R.critical")       #The estimated minimum correlation
  attr(body_fat_Decorrelated,"IDeAEvolution")    #Evolution of the algorithm

The main attribute is “UPLTM”, which stores the specific linear transformation matrix from the observed variables to the latent variables. The “IDeAEvolution” attribute can be used to verify whether the algorithm achieved the target correlation goal and to inspect the sparsity of the returned matrix.

1.4.2 Plotting the Evolution

Here we use attr(body_fat_Decorrelated,"IDeAEvolution") to plot the evolution of the correlation measure and of the matrix sparsity.

par(mfrow=c(1,2),cex=0.5)

# Correlation
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Corr
xidx <- c(1:length(yval))
plot(xidx,yval,
     xlab="Iteration Cycle",
     ylab="Max. Pearson Correlation",
     ylim=c(0,1.0),
     main="Evolution of the maximum Correlation")
  lfit <-try(loess(yval~xidx,span=0.5));
  if (!inherits(lfit,"try-error"))
  {
    plx <- try(predict(lfit,se=TRUE))
    if (!inherits(plx,"try-error"))
    {
      lines(xidx,plx$fit,lty=1,col="red")
    }
  }

# Sparsity  
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Spar

plot(xidx,yval,
     xlab="Iteration Cycle",
     ylab="Matrix Sparsity",
     ylim=c(0,1.0),
     main="Evolution of the Matrix Sparsity")
  lfit <-try(loess(yval~xidx,span=0.5));
  if (!inherits(lfit,"try-error"))
  {
    plx <- try(predict(lfit,se=TRUE))
    if (!inherits(plx,"try-error"))
    {
      lines(xidx,plx$fit,lty=1,col="red")
    }
  }

1.4.3 The ILAA Transformed Data

Before exploring the properties of the ILAA results in more detail, let us first verify that the returned data set does not contain features with very high correlation among them.

Here I’ll plot the original correlation and the correlation of the returned data set.


# The original
  par(cex=0.6,cex.main=0.85,cex.axis=0.7)
  cormat <- cor(body_fat,method="pearson")
  gplots::heatmap.2(abs(cormat),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(5)),
                    main = "Original Correlation",
                    cexRow = 0.75,
                    cexCol = 0.75,
                     srtCol=30,
                     srtRow=60,
                    key.title=NA,
                    key.xlab="|Pearson Correlation|",
                    xlab="Feature", ylab="Feature")


# The transformed
  cormat <- cor(body_fat_Decorrelated,method="pearson")
  gplots::heatmap.2(abs(cormat),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(5)),
                    main = "Correlation After ILAA",
                    cexRow = 0.75,
                    cexCol = 0.75,
                     srtCol=30,
                     srtRow=60,
                    key.title=NA,
                    key.xlab="|Pearson Correlation|",
                    xlab="Feature", ylab="Feature")
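
Besides the heat maps, a single number can summarize the check: the largest absolute off-diagonal correlation. The following is a minimal base-R sketch; max_abs_cor is a hypothetical helper name (not a FRESA.CAD function), and the data are simulated for illustration only.

```r
# Largest absolute off-diagonal correlation of a numeric data frame.
# max_abs_cor is a hypothetical helper name, not a FRESA.CAD function.
max_abs_cor <- function(df, method = "pearson") {
  cm <- abs(cor(df, method = method))
  diag(cm) <- 0            # ignore each variable's correlation with itself
  max(cm)
}

# Small simulated example: b is nearly a copy of a, c is independent noise
set.seed(1)
a <- rnorm(200)
toy <- data.frame(a = a, b = a + rnorm(200, sd = 0.1), c = rnorm(200))
print(max_abs_cor(toy))
```

Applied to the tutorial objects, max_abs_cor(body_fat) and max_abs_cor(body_fat_Decorrelated) should give the two numbers to compare against the chosen thr.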

1.4.4 Exploring the Transformation

attr(body_fat_Decorrelated,"UPLTM") returns the transformation matrix. The UPLTM is sparse; the following heat map of the transformation matrix shows which elements are different from zero.


  UPLTM <- attr(body_fat_Decorrelated,"UPLTM")
  
  gplots::heatmap.2(1.0*(abs(UPLTM)>0),
                    trace = "none",
                    mar = c(5,5),
                    col=rev(heat.colors(5)),
                    main = "Transformation matrix",
                    cexRow = 0.75,
                    cexCol = 0.75,
                   srtCol=30,
                   srtRow=60,
                    key.title=NA,
                    key.xlab="|Beta|>0",
                    xlab="Output Feature", ylab="Input Feature")
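
To make the orientation of the matrix concrete, here is a small sketch with a hypothetical 3 x 3 transformation. It assumes that the transformed data are the product of the observed data and the transformation matrix, with rows as input features and columns as output features, as the heat map axis labels suggest. The matrix W and the two observations are invented for illustration.

```r
# Hypothetical 3 x 3 transformation: rows are input features,
# columns are output features (same orientation as the heat map axes).
W <- matrix(c(1, -0.5,  0.0,
              0,  1.0, -0.2,
              0,  0.0,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("x1", "x2", "x3"),
                            c("x1", "La_x2", "La_x3")))

# Fraction of non-zero entries (what the heat map displays as |Beta| > 0)
nonzero_fraction <- mean(W != 0)

# Applying the transformation to two hypothetical observations
X <- matrix(c(10, 6, 3,
              20, 9, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, rownames(W)))
latent <- X %*% W

print(nonzero_fraction)
print(latent)
```

In this toy matrix, x1 is unaltered, La_x2 removes half of x1 from x2, and La_x3 removes 0.2 times x2 from x3.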

1.4.5 The Latent Formulas

The sparsity of the UPLTM matrix can be analyzed to get the formula of each latent variable. The getLatentCoefficients() function and the "LatentCharFormulas" attribute of its result, attr(LatentFormulas,"LatentCharFormulas"), can be used to display these formulas.

# Get a list with the latent formulas' coefficients
LatentFormulas <- getLatentCoefficients(body_fat_Decorrelated)

# A string character with the formulas can be obtained by:
charFormulas <- attr(LatentFormulas,"LatentCharFormulas")
pander::pander(as.matrix(charFormulas))
La_BodyFat + BodyFat + (0.148)Weight - (0.986)Abdomen
La_Age + Age + (0.346)Weight - (1.339)Abdomen + (1.520)Thigh - (6.199)Wrist
La_Height - (0.173)Weight + Height + (0.177)Chest + (0.150)Abdomen + (0.231)Thigh
La_Neck - (0.046)Weight + Neck - (0.037)Thigh - (0.743)Wrist
La_Chest - (0.065)Weight + Chest - (0.717)Abdomen + (0.270)Thigh
La_Abdomen - (0.366)Weight + (0.631)Chest + (0.547)Abdomen + (0.171)Thigh
La_Hip - (0.177)Weight + (0.189)Chest - (0.136)Abdomen + Hip - (0.360)Thigh
La_Thigh - (0.154)Weight + Thigh
La_Knee - (0.024)Weight - (0.171)Height - (0.030)Chest - (0.026)Abdomen - (0.142)Thigh + Knee
La_Ankle - (0.045)Weight + (0.044)Chest + (0.038)Abdomen - (0.013)Thigh + Ankle - (0.493)Wrist
La_Biceps - (0.050)Weight - (0.114)Chest + (0.082)Abdomen - (0.194)Thigh + Biceps
La_Forearm - (3.88e-03)Weight - (0.077)Chest + (0.055)Abdomen - (9.62e-03)Thigh - (0.261)Biceps + Forearm - (0.633)Wrist
La_Wrist - (0.031)Weight + (0.050)Thigh + Wrist
La_BMI - (0.135)Weight + (0.701)Height - (0.019)Abdomen + BMI

1.4.6 The Formula Network

The graph_from_adjacency_matrix() function from igraph can be used to visualize the association between variables.

par(op)

transform <- log(abs(attr(body_fat_Decorrelated,"UPLTM"))+1.0)
colnames(transform) <- str_remove_all(colnames(transform),"La_")


VertexSize <- apply(transform,2,mean)
VertexSize <- 5*VertexSize/max(VertexSize)


gr <- graph_from_adjacency_matrix(transform,mode = "directed",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr

fc <- cluster_optimal(gr)
plot(fc, gr,
     edge.width=3*E(gr)$weight,
     edge.arrow.size=0.65,
     edge.arrow.width=0.65,
     vertex.size=VertexSize,
     vertex.label.cex=0.75,
     vertex.label.dist=1,
     main="Feature Association")

par(op)

1.4.7 Latent Variable Interpretation

ILAA returns the Unit Preserving Linear Transformation Matrix (UPLTM). This transformation combines the statistically significant linear associations found between feature pairs. Each significant association is modeled by a linear equation; hence, each feature is interpreted as follows:

  • Each discovered latent variable is the residual of the observed parent variable with respect to the linear model of the variables associated with it. For example: \[ La_{Wrist} = Wrist + 0.017\,BodyFat - 0.026\,Weight. \]

    This equation states that \(Wrist\) is associated with \(BodyFat\) and \(Weight\), and that the latent variable \(La_{Wrist}\) is the information in \(Wrist\) that is not explained by \(BodyFat\) or \(Weight\).

  • The model of the \(Wrist\) is therefore:

\[ Wrist = -0.017\,BodyFat + 0.026\,Weight. \]
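
The residual interpretation can be illustrated with a minimal base-R sketch. This is not the ILAA algorithm itself, only the single-association idea behind it: subtracting the fitted slope of an associated variable leaves a quantity that is uncorrelated with that variable. The data are simulated, and the variable names only echo the example above.

```r
set.seed(42)
n <- 300
weight <- rnorm(n, mean = 80, sd = 10)            # simulated associated variable
wrist  <- 10 + 0.1 * weight + rnorm(n, sd = 0.5)  # simulated parent variable

# Linear model of wrist given weight
fit  <- lm(wrist ~ weight)
beta <- unname(coef(fit)["weight"])

# Latent variable: the observed value minus the part explained by weight.
# Only the slope term is removed, so the variable keeps its original units.
la_wrist <- wrist - beta * weight

print(c(before = cor(wrist, weight), after = cor(la_wrist, weight)))
```

The "after" correlation should be zero up to floating-point error, while the "before" correlation is high.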

The following code plots the association of each latent variable with its observed parent variable, and the association of each parent variable with its linear model.


par(mfrow=c(1,2),cex=0.35)
for (fnames in names(charFormulas))
{
  # Name of the observed parent variable
  obsname <- str_remove(fnames,"La_")
  # Center the y-axis on the latent mean, spanning the range of the parent
  menv <- mean(body_fat_Decorrelated[,fnames])
  obsrange <- max(body_fat[,obsname])-min(body_fat[,obsname])
  ylim <- c(menv-obsrange/2,menv+obsrange/2)
  # Left panel: latent variable vs. observed parent
  plot(body_fat[,obsname],
       body_fat_Decorrelated[,fnames],
       ylim=ylim,
       ylab=fnames,
       xlab=obsname,
       main=paste("ILAA Latent Variable:",fnames))
  
  # Right panel: observed parent vs. its linear model from the other variables
  deformula <- LatentFormulas[[fnames]]
  noInames <- names(deformula)[names(deformula) != obsname]
  predObs <- -(as.matrix(body_fat[,noInames]) %*% deformula[noInames])
  plot(predObs,
       body_fat[,obsname],
       ylab=obsname,
       xlab=charFormulas[fnames],
       main=paste("ILAA Generated Predictions of",obsname),
       cex.lab=0.5)
}


par(op)

The visual inspection of the above-displayed figures shows that some latent variables are not associated with the original parent variable, but their model is fully correlated to the observed parent variable. A clear example is BMI (The last plot in the above figure).

1.5 ILAA for Supervised Learning

The recommended use of the ILAA transformation in supervised learning is to split the data into training and validation sets. Hence, the next lines of code split the data into training (75%) and testing (25%) sets.

1.5.1 Split into Training Testing Sets


# 75% for training 25% for testing 
set.seed(2)
trainsamples <- sample(nrow(body_fat),3*nrow(body_fat)/4)

trainingset <- body_fat[trainsamples,]
testingset <- body_fat[-trainsamples,]

1.6 Data Train Analysis and Prediction of the Test Set

By default, ILAA() transforms are blind to outcome associations, but in supervised learning the user is free to specify a target outcome to drive the shape of the transformation matrix. Outcome-driven transformations try to keep the features that are strongly associated with the target unaltered.

The predictDecorrelate() function can be used to transform any new data set with a previously estimated ILAA transformation.

The next code snippet shows the process of transforming the training set and then using the returned object to transform the testing set using both outcome-blind and outcome-driven transformations.
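
The estimate-on-train, apply-to-test pattern can be sketched in base R with a single association. This is only an analogy for what ILAA() and predictDecorrelate() do, not their implementation; the data and variable names are simulated and hypothetical.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- 2 * x1 + rnorm(n, sd = 0.5)
toy <- data.frame(x1 = x1, x2 = x2)

# 75% / 25% split, mirroring the tutorial
train_idx <- sample(nrow(toy), floor(0.75 * nrow(toy)))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# "Fit" step: the association is estimated on the training rows only
beta <- unname(coef(lm(x2 ~ x1, data = train))["x1"])

# "Apply" step: the same coefficient transforms both sets
train$La_x2 <- train$x2 - beta * train$x1
test$La_x2  <- test$x2  - beta * test$x1

print(c(train = cor(train$La_x2, train$x1),
        test  = cor(test$La_x2,  test$x1)))
```

On the training rows the residual correlation is zero by construction; on the test rows it is only expected to be small, because the coefficient was not estimated from them.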


## Outcome-blind
body_fat_Decorrelated_train <- ILAA(trainingset,
                                    thr=0.2,
                                    Outcome="BodyFat",
                                    verbose=TRUE)

fast | LM | Included: 14 , Uni p: 0.01071429 , Outcome-Driven Size: 0 , Base Size: 1 , Rcrit: 0.1676986

1 <R=0.940,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.750>, Top: 1( 5 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.852

3 <R=0.852,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.969

4 <R=0.969,thr=0.950>, Top: 1( 1 )1 : 1 Fa= 2 : 0.950,<|>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.713

5 <R=0.713,thr=0.600>, Top: 1( 3 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 13 , Added: 3 , Zero Std: 0 , Max Cor: 0.462

6 <R=0.462,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 3 : 0.450,<|>Tot Used: 13 , Added: 1 , Zero Std: 0 , Max Cor: 0.419

7 <R=0.419,thr=0.200>, Top: 2( 5 )1 : 2 Fa= 4 : 0.200,<|>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.401

8 <R=0.401,thr=0.200>, Top: 2( 4 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.405

9 <R=0.405,thr=0.200>, Top: 1( 4 )1 : 1 Fa= 5 : 0.200,<|>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.394

10 <R=0.394,thr=0.200>, Top: 1( 4 )1 : 1 Fa= 6 : 0.200,<|>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.305

11 <R=0.305,thr=0.200>, Top: 1( 3 )1 : 1 Fa= 7 : 0.200,<|>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.250

12 <R=0.250,thr=0.200>, Top: 3( 1 )1 : 3 Fa= 9 : 0.200,<|>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.193

13 <R=0.193,thr=0.200>

[ 13 ], 0.1931707 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 1 , Outcome Base: 0

pander::pander(attr(body_fat_Decorrelated_train,"drivingFeatures"))

Weight


body_fat_Decorrelated_test <- predictDecorrelate(body_fat_Decorrelated_train
                                                 ,testingset)

## Outcome-driven transformation
body_fat_Decorrelated_trainD <- ILAA(trainingset,
                                     thr=0.2,
                                     Outcome="BodyFat",
                                     drivingFeatures="BodyFat",
                                     verbose=TRUE)

fast | LM |Abdomen Abdomen Weight Abdomen

Included: 14 , Uni p: 0.01071429 , Outcome-Driven Size: 1 , Base Size: 1 , Rcrit: 0.1676986

1 <R=0.940,thr=0.900>, Top: 2( 1 )1 : 2 Fa= 2 : 0.900,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.750>, Top: 1( 2 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 6 , Added: 2 , Zero Std: 0 , Max Cor: 0.732

3 <R=0.732,thr=0.600>, Top: 2( 4 )1 : 2 Fa= 2 : 0.600,<|>Tot Used: 11 , Added: 6 , Zero Std: 0 , Max Cor: 0.870

4 <R=0.870,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 3 : 0.750,<|>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.786

5 <R=0.786,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 3 : 0.750,<|>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.785

6 <R=0.785,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 3 : 0.750,<|>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.627

7 <R=0.627,thr=0.600>, Top: 1( 1 )1 : 1 Fa= 3 : 0.600,<|>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.571

8 <R=0.571,thr=0.450>, Top: 2( 6 )1 : 2 Fa= 3 : 0.450,<|>Tot Used: 14 , Added: 7 , Zero Std: 0 , Max Cor: 0.528

9 <R=0.528,thr=0.450>, Top: 3( 1 )1 : 3 Fa= 4 : 0.450,<|>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.411

10 <R=0.411,thr=0.200>, Top: 1( 2 )1 : 1 Fa= 4 : 0.200,<|>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.411

11 <R=0.411,thr=0.200>, Top: 1( 5 )1 : 1 Fa= 4 : 0.200,<|>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.398

12 <R=0.398,thr=0.200>, Top: 1( 4 )1 : 1 Fa= 5 : 0.200,<|>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.345

13 <R=0.345,thr=0.200>, Top: 2( 3 )1 : 2 Fa= 7 : 0.200,<|>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.316

14 <R=0.316,thr=0.200>, Top: 4( 1 )1 : 4 Fa= 9 : 0.200,<|>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.205

15 <R=0.205,thr=0.200>, Top: 1( 1 )1 : 1 Fa= 9 : 0.200,<|>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.193

16 <R=0.193,thr=0.200>

[ 16 ], 0.1930372 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 1 , Outcome Base: 1


pander::pander(attr(body_fat_Decorrelated_trainD,"drivingFeatures"))

Abdomen


body_fat_Decorrelated_testD <- predictDecorrelate(body_fat_Decorrelated_trainD
                                                  ,testingset)

1.6.1 Train a Regression Model for Body Fat Prediction

Once we have a transformed training and testing set, we can proceed to train a linear model of the body fat content. For this example we will use the LASSO_1SE() function of the FRESA.CAD package to model the \(BodyFat\) using all the variables in the transformed training set.


## Outcome-Blind
modelBodyFat <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_train)
pander::pander(as.matrix(modelBodyFat$coef))
(Intercept) -19.5335
La_Age 0.1738
Weight 0.1609
La_Abdomen 0.7050
La_Thigh 0.0385
La_Wrist -1.5206
La_BMI 1.0894

## Outcome-Driven
modelBodyFatD <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_trainD)
pander::pander(as.matrix(modelBodyFatD$coef))
(Intercept) -37.0446
La_Weight -0.0557
Abdomen 0.5842

The last lines of code display the beta coefficients of the model.

1.6.2 The Model Coefficients in the Observed Space

The FRESA.CAD package provides a handy function, getObservedCoef(), to get the linear beta coefficients in the observed space from a model fitted on the transformed object. The next code shows the procedure.


# Get the coefficients in the observed space for the outcome-blind
observedCoef <- getObservedCoef(body_fat_Decorrelated_train,modelBodyFat)
pander::pander(as.matrix(observedCoef$coefficients))
(Intercept) -19.5335
Age 0.0310
Weight -0.0358
Chest -0.2068
Abdomen 0.7050
Thigh -0.0884
Knee -0.0168
Biceps 0.3742
Wrist -0.7487
BMI 0.2649

# The outcome-driven coefficients
observedCoefD <- getObservedCoef(body_fat_Decorrelated_trainD,modelBodyFatD)
pander::pander(as.matrix(observedCoefD$coefficients))
(Intercept) -37.0446
Weight -0.0557
Abdomen 0.7149

1.6.3 Predict Using the Transformed Data-Set

The user can predict the BodyFat content using the handy predict() function. After that we can measure the testing performance using the predictionStats_regression() function.


## Outcome-Blind
predicBodyFat <- predict(modelBodyFat,body_fat_Decorrelated_test)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             predicBodyFat),
                                       "Body Fat: Blind")

Body Fat: Blind

pander::pander(rmetrics)
  • corci:

    cor    
    0.832 0.736 0.895
  • biasci: -0.0546, -1.1682 and 1.0591

  • RMSEci: 4.42, 3.77 and 5.36

  • spearmanci:

    50% 2.5% 97.5%
    0.854 0.749 0.917
  • MAEci:

    50% 2.5% 97.5%
    3.5 2.89 4.18
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    11.7 61 3.07e-17 * * * two.sided 0.832

## Outcome-Driven
predicBodyFatD <- predict(modelBodyFatD,body_fat_Decorrelated_testD)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             predicBodyFatD),
                                       "Body Fat: Driven")

Body Fat: Driven

pander::pander(rmetrics)
  • corci:

    cor    
    0.832 0.736 0.895
  • biasci: 0.125, -0.989 and 1.238

  • RMSEci: 4.42, 3.77 and 5.36

  • spearmanci:

    50% 2.5% 97.5%
    0.838 0.723 0.908
  • MAEci:

    50% 2.5% 97.5%
    3.46 2.81 4.19
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    11.7 61 3.12e-17 * * * two.sided 0.832

The reported metrics indicate that the model predictions are highly correlated with the real \(BodyFat\) (a Pearson correlation of 0.832 on the test set for both transformations).

1.6.4 Prediction Using the Observed Features

An ILAA user has the option to predict the \(BodyFat\) content from the observed testing set using the computed beta coefficients. The next lines of code show how to do the prediction using the model.matrix() R function and the matrix product %*% :



predicBodyFatObst <- model.matrix(formula(observedCoef$formula),testingset) %*% observedCoef$coefficients

plot(predicBodyFatObst,
     predicBodyFat,
     xlab="Observed Space",
     ylab="Transformed Space",
     main="Test Predictions: Observed vs. Transformed")

The last plot shows the expected result: that both predictions are identical.
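
The identity of the two predictions follows from linearity: if the latent data are Z = XW, then Zβ = X(Wβ), so Wβ are the coefficients in the observed space. The sketch below checks this with simulated data and an ordinary lm() fit. It assumes the rows-as-inputs, columns-as-outputs orientation of the transformation, and it is not the code of getObservedCoef(); W and all variable names are hypothetical.

```r
set.seed(3)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
X  <- cbind(x1 = x1, x2 = x2)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.2)

# Hypothetical transformation: La_x2 = x2 - b * x1 (rows = inputs, columns = outputs)
b <- unname(coef(lm(x2 ~ x1))["x1"])
W <- matrix(c(1, -b,
              0,  1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("x1", "x2"), c("x1", "La_x2")))
Z <- X %*% W                                   # transformed (latent) data

# Linear model fitted in the transformed space
fit_latent  <- lm(y ~ Z)
intercept   <- unname(coef(fit_latent)[1])
beta_latent <- coef(fit_latent)[-1]            # drop the intercept

# Map the coefficients back to the observed space: beta_obs = W %*% beta_latent
beta_obs <- drop(W %*% beta_latent)

pred_latent   <- intercept + drop(Z %*% beta_latent)
pred_observed <- intercept + drop(X %*% beta_obs)
print(all.equal(pred_latent, pred_observed))
```

The same algebra holds for any linear model fitted on the transformed data, including the LASSO model used in this tutorial.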

1.6.5 Comparison to Raw Model

A last experiment is to compare the differences between a LASSO model created from the observed features to the model created from the transformed observations.

The next lines of code compute the linear model using LASSO from the original observed data. Then, it computes the predicted performance.

rawmodelBodyFat <- LASSO_1SE(BodyFat~.,trainingset)
pander::pander(rawmodelBodyFat$coef)
(Intercept) Height Abdomen
-23.2 -0.189 0.601

rawpredicBodyFat <- predict(rawmodelBodyFat,testingset)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
                                             rawpredicBodyFat),"Body Fat")

Body Fat

pander::pander(rmetrics)
  • corci:

    cor    
    0.808 0.701 0.88
  • biasci: 0.169, -1.020 and 1.358

  • RMSEci: 4.72, 4.02 and 5.72

  • spearmanci:

    50% 2.5% 97.5%
    0.813 0.687 0.894
  • MAEci:

    50% 2.5% 97.5%
    3.66 3 4.43
  • pearson:

    Pearson’s product-moment correlation: predictions[, 1] and predictions[, 2]
    Test statistic df P value Alternative hypothesis cor
    10.7 61 1.13e-15 * * * two.sided 0.808

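The evaluation of the testing results indicates that the predictions of the model estimated from the observed features have a correlation of 0.808 with the true \(BodyFat\). This is slightly lower than the correlation obtained from the model estimated in the transformed space, but the confidence intervals reported above overlap widely and the difference is not statistically significant: ( \(\rho _t=0.832\) vs. \(\rho _o=0.808\) )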

1.6.6 Comparing the Feature Significance on the Models

The main advantage of the ILAA transformation is that the returned latent variables are not collinear; hence, the statistical significance of the beta coefficients is not affected by multicollinearity. The next code snippet shows how to get the beta coefficients using the lm() and summary.lm() functions.

The inspection of the summary results clearly shows that most of the beta coefficients on the transformed data set are significant.
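The effect of collinearity on the coefficient estimates can be illustrated with the variance inflation factor (VIF). The following is a minimal sketch on simulated data, not the ILAA algorithm itself: the variables x1, x2, and la_x2 are hypothetical, and the "latent" variable is simply the residual of x2 after regressing it on x1.

```r
# Toy illustration: collinearity inflates the variance of the estimates.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(n)   # strongly correlated with x1

# Variance inflation factor of one predictor given the others: 1 / (1 - R^2)
vif <- function(target, others) {
  fit <- lm(target ~ ., data = data.frame(target = target, others))
  1 / (1 - summary(fit)$r.squared)
}

# Hypothetical latent variable: the part of x2 not explained by x1
la_x2 <- residuals(lm(x2 ~ x1))

vif_raw    <- vif(x2, data.frame(x1 = x1))
vif_latent <- vif(la_x2, data.frame(x1 = x1))
print(c(raw = vif_raw, latent = vif_latent))
```

A VIF near 1 means that the standard error of the coefficient is not inflated by the other predictors, which is the situation the decorrelated variables aim for.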


## Raw Model
par(mfrow=c(2,2),cex=0.5)
rawlm <- lm(BodyFat~.,
            trainingset[,c("BodyFat",names(rawmodelBodyFat$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.733 9.2028 -0.514 6.08e-01
Height -0.580 0.1324 -4.384 1.95e-05 * * *
Abdomen 0.699 0.0331 21.117 3.61e-51 * * *
plot(rawlm)


## Outcome-Blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,body_fat_Decorrelated_train[,c("BodyFat",names(modelBodyFat$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -24.190 9.2576 -2.61 9.73e-03 * *
La_Age 0.236 0.0269 8.75 1.45e-15 * * *
Weight 0.184 0.0118 15.56 3.30e-35 * * *
La_Abdomen 0.903 0.1081 8.35 1.75e-14 * * *
La_Thigh 0.335 0.1293 2.59 1.04e-02 *
La_Wrist -2.586 0.5538 -4.67 5.88e-06 * * *
La_BMI 1.519 0.2248 6.76 1.84e-10 * * *
plot(Delm)


## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,
           body_fat_Decorrelated_trainD[,c("BodyFat",names(modelBodyFatD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: BodyFat ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -47.671 3.1314 -15.2 1.85e-34 * * *
La_Weight -0.124 0.0254 -4.9 2.12e-06 * * *
Abdomen 0.671 0.0321 20.9 1.41e-50 * * *
plot(Delm)


par(op)

1.7 Train a Logistic Model for Overweight Prediction

This last experiment showcases the effect of data transformation on logistic modeling. It starts by creating data frames that do not include the \(BMI\), \(Height\), and \(Weight\) variables. The target outcome is to identify whether the person is overweight (BMI>=25) or normal. The next lines of code compute the new data frames and remove the above-mentioned variables.

1.7.1 Data Conditioning

First, remove Height and Weight from the training and testing sets, define the Overweight outcome from BMI, and then remove BMI.


trainingsetBMI <- trainingset[,!(colnames(trainingset) %in% c("Weight","Height"))]
testingsetBMI <- testingset[,!(colnames(testingset) %in% c("Weight","Height"))]
trainingsetBMI$Overweight <- 1*(trainingsetBMI$BMI>=25)
testingsetBMI$Overweight <- 1*(testingsetBMI$BMI>=25)
trainingsetBMI$BMI <- NULL
testingsetBMI$BMI <- NULL

# The number of subjects
pander::pander(table(trainingsetBMI$Overweight))
0 1
96 92
pander::pander(table(testingsetBMI$Overweight))
0 1
29 34

## The outcome-blind transformation
OW_Decorrelated_train <- ILAA(trainingsetBMI,
                              thr=0.2,
                              Outcome="Overweight",
                              verbose=TRUE)

fast | LM | Included: 12 , Uni p: 0.0125 , Outcome-Driven Size: 0 , Base Size: 1 , Rcrit: 0.1634602

1 <R=0.917,thr=0.900>, Top: 1( 1 )1 : 1 Fa= 1 : 0.900,<|>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.750>, Top: 1( 3 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.721

3 <R=0.721,thr=0.600>, Top: 1( 5 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 10 , Added: 5 , Zero Std: 0 , Max Cor: 0.566

4 <R=0.566,thr=0.450>, Top: 3( 1 )1 : 3 Fa= 4 : 0.450,<|>Tot Used: 11 , Added: 4 , Zero Std: 0 , Max Cor: 0.408

5 <R=0.408,thr=0.200>, Top: 2( 5 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.441

6 <R=0.441,thr=0.200>, Top: 2( 2 )1 : 2 Fa= 6 : 0.200,<|>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.461

7 <R=0.461,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 6 : 0.450,<|>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.535

8 <R=0.535,thr=0.450>, Top: 1( 1 )1 : 1 Fa= 7 : 0.450,<|>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.408

9 <R=0.408,thr=0.200>, Top: 2( 4 )1 : 2 Fa= 7 : 0.200,<|>Tot Used: 12 , Added: 6 , Zero Std: 0 , Max Cor: 0.251

10 <R=0.251,thr=0.200>, Top: 2( 2 )1 : 2 Fa= 8 : 0.200,<|>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.203

11 <R=0.203,thr=0.200>, Top: 1( 1 )1 : 1 Fa= 8 : 0.200,<|>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.188

12 <R=0.188,thr=0.200>

[ 12 ], 0.1881405 Decor Dimension: 12 Nused: 12 . Cor to Base: 10 , ABase: 1 , Outcome Base: 0


OW_Decorrelated_test <- predictDecorrelate(OW_Decorrelated_train,testingsetBMI)

## The outcome-driven transformation

OW_Decorrelated_trainD <- ILAA(trainingsetBMI,
                               thr=0.2,
                               Outcome="Overweight",
                               drivingFeatures="Overweight",
                               verbose=TRUE)

fast | LM |Chest Chest Hip Chest

Included: 12 , Uni p: 0.0125 , Outcome-Driven Size: 1 , Base Size: 1 , Rcrit: 0.1634602

1 <R=0.917,thr=0.900>, Top: 1( 1 )1 : 1 Fa= 1 : 0.900,<|>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880

2 <R=0.880,thr=0.750>, Top: 1( 2 )1 : 1 Fa= 1 : 0.750,<|>Tot Used: 4 , Added: 2 , Zero Std: 0 , Max Cor: 0.783

3 <R=0.783,thr=0.750>, Top: 1( 1 )1 : 1 Fa= 2 : 0.750,<|>Tot Used: 6 , Added: 1 , Zero Std: 0 , Max Cor: 0.743

4 <R=0.743,thr=0.600>, Top: 1( 5 )1 : 1 Fa= 2 : 0.600,<|>Tot Used: 10 , Added: 5 , Zero Std: 0 , Max Cor: 0.706

5 <R=0.706,thr=0.600>, Top: 2( 1 )1 : 2 Fa= 3 : 0.600,<|>Tot Used: 10 , Added: 2 , Zero Std: 0 , Max Cor: 0.497

6 <R=0.497,thr=0.450>, Top: 4( 1 )1 : 4 Fa= 5 : 0.450,<|>Tot Used: 12 , Added: 4 , Zero Std: 0 , Max Cor: 0.410

7 <R=0.410,thr=0.200>, Top: 2( 5 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.398

8 <R=0.398,thr=0.200>, Top: 2( 1 )1 : 2 Fa= 5 : 0.200,<|>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.398

9 <R=0.398,thr=0.200>, Top: 1( 3 )1 : 1 Fa= 6 : 0.200,<|>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.354

10 <R=0.354,thr=0.200>, Top: 1( 3 )1 : 1 Fa= 7 : 0.200,<|>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.375

11 <R=0.375,thr=0.200>, Top: 2( 1 )1 : 2 Fa= 9 : 0.200,<|>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.217

12 <R=0.217,thr=0.200>, Top: 1( 1 )1 : 1 Fa= 9 : 0.200,<|>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.185

13 <R=0.185,thr=0.200>

[ 13 ], 0.1847617 Decor Dimension: 12 Nused: 12 . Cor to Base: 11 , ABase: 1 , Outcome Base: 1


OW_Decorrelated_testD <- predictDecorrelate(OW_Decorrelated_trainD,testingsetBMI)

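The last code snippets transform the observed features using ILAA. Both calls set Overweight as the target variable through the Outcome argument, so the convergence is not affected by the target outcome. The second call also sets drivingFeatures to obtain the outcome-driven transformation.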
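In the snippets above, the transformation is estimated on the training set and then applied unchanged to the testing set through predictDecorrelate(). The base-R sketch below illustrates that idea with two simulated variables. The matrix W and the names a, b, and La_b are hypothetical, and the single regression step is only a stand-in for the iterative ILAA procedure.

```r
# Toy sketch: estimate a decorrelating matrix on the training data only,
# then apply the same matrix to the testing data.
set.seed(7)
make_data <- function(n) {
  a <- rnorm(n)
  data.frame(a = a, b = 0.8 * a + rnorm(n, sd = 0.5))
}
train <- make_data(100)
test  <- make_data(100)

# Slope of b on a, estimated from the training set only
slope <- unname(coef(lm(b ~ a, data = train))["a"])

# Hypothetical transformation matrix: La_b = b - slope * a
W <- diag(2)
dimnames(W) <- list(c("a", "b"), c("a", "La_b"))
W["a", "La_b"] <- -slope

train_t <- as.matrix(train) %*% W
test_t  <- as.matrix(test)  %*% W

cor_train <- cor(train_t)["a", "La_b"]   # zero up to rounding error
cor_test  <- cor(test_t)["a", "La_b"]    # small, but not exactly zero
print(c(train = cor_train, test = cor_test))
```

The residual correlation is removed exactly on the training data and only approximately on the testing data, because the matrix was never fitted to the testing subjects.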

1.7.2 The Logistic Model

LASSO_1SE with a binomial family is used to compute the logistic model of overweight.


## Outcome-blind
modelOverweight <- LASSO_1SE(Overweight~.,
                             OW_Decorrelated_train,
                             family="binomial")
pander::pander(as.matrix(modelOverweight$coef))
(Intercept) -42.7392
La_BodyFat 0.1016
Age 0.0336
La_Neck 0.3030
La_Chest 0.2422
La_Abdomen 0.0397
Hip 0.4833
La_Ankle 0.0407
La_Wrist 0.0244

## Outcome-driven
modelOverweightD <- LASSO_1SE(Overweight~.,
                              OW_Decorrelated_trainD,
                              family="binomial")
pander::pander(as.matrix(modelOverweightD$coef))
(Intercept) -41.2180
La_BodyFat 0.0496
La_Age 0.0078
Chest 0.4354
La_Abdomen 0.0278
La_Thigh 0.0466

1.7.3 The Model Coefficients in the Observed Space

Once the logistic model is created in the transformed space, we can compute the beta coefficients for each one of the observed variables.


# Get the coefficients in the observed space
observedCoef <- getObservedCoef(OW_Decorrelated_train,modelOverweight)
pander::pander(as.matrix(observedCoef$coefficients))
(Intercept) -42.73919
BodyFat 0.04673
Age -0.00213
Neck -0.00785
Chest 0.22194
Abdomen 0.03972
Hip 0.15306
Ankle 0.03637
Wrist 0.02443
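The mapping back to the observed space follows from the linearity of the transformation. As a sketch, assume that the latent variables are obtained as \(z = W^{T}x\), where \(W\) is the transformation matrix, \(\beta_L\) holds the coefficients estimated in the latent space, and \(\beta_O\) holds the coefficients in the observed space (this notation is introduced here for illustration and is not taken from the package):

\[
\hat{y} = \beta_0 + z^{T}\beta_L = \beta_0 + (W^{T}x)^{T}\beta_L = \beta_0 + x^{T}(W\beta_L) \quad\Rightarrow\quad \beta_O = W\beta_L
\]

The intercept is unchanged, which agrees with the outputs of this section (-42.7392 in both spaces), while each observed coefficient collects the contributions of every latent variable that includes that observed variable.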

1.7.4 Predict Using the Transformed Data Set

The predictions of the testing set can be obtained with the predict() function, and the testing results can be evaluated using the predictionStats_binary() function.


## Outcome-blind
predicOverweight <- predict(modelOverweight,OW_Decorrelated_test)
pr <- predictionStats_binary(cbind(OW_Decorrelated_test$Overweight,
                                   predicOverweight),"Overweight: Blind")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.841 0.746 0.921
  • senci:

    50% 2.5% 97.5%
    0.836 0.735 0.92
  • aucci:

    50% 2.5% 97.5%
    0.836 0.735 0.92
  • berci:

    50% 2.5% 97.5%
    0.164 0.0801 0.265
  • preci:

    50% 2.5% 97.5%
    0.85 0.75 0.927
  • F1ci:

    50% 2.5% 97.5%
    0.838 0.735 0.92

## Outcome-Driven
predicOverweightD <- predict(modelOverweightD,OW_Decorrelated_testD)
pr <- predictionStats_binary(cbind(OW_Decorrelated_testD$Overweight,
                                   predicOverweightD),"Overweight: Driven")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.841 0.746 0.921
  • senci:

    50% 2.5% 97.5%
    0.843 0.75 0.927
  • aucci:

    50% 2.5% 97.5%
    0.843 0.75 0.927
  • berci:

    50% 2.5% 97.5%
    0.157 0.0732 0.25
  • preci:

    50% 2.5% 97.5%
    0.842 0.747 0.923
  • F1ci:

    50% 2.5% 97.5%
    0.841 0.744 0.921

1.7.5 Prediction Using the Observed Features

The prediction of the testing set can be computed using model.matrix() and the matrix product %*%.


predicOverweightObst <- model.matrix(formula(observedCoef$formula),testingsetBMI) %*% observedCoef$coefficients
#predicOverweightObst <- 1.0/(1.0 + exp(-predicOverweightObst));

plot(predicOverweightObst,predicOverweight,
     xlab="Observed",
     ylab="Transformed",
     main="Test predictions: Observed vs. Transformed")

The last plot shows the expected result: both predictions are identical.
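The commented line in the snippet above converts the linear predictor into a probability with the logistic function. The following self-contained sketch shows both steps on simulated data. It uses glm() instead of LASSO_1SE(), and the data frame d and its columns are hypothetical.

```r
# Toy sketch: linear predictor via model.matrix() and %*%,
# then conversion to probabilities with the logistic function.
set.seed(3)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 + 1.2 * d$x1 - 0.7 * d$x2))

fit  <- glm(y ~ x1 + x2, data = d, family = binomial)
beta <- coef(fit)

# Design matrix with an intercept column, in the same order as beta
X      <- model.matrix(~ x1 + x2, d)
linear <- drop(X %*% beta)   # log-odds scale
prob   <- plogis(linear)     # same as 1 / (1 + exp(-linear))

# Both quantities agree with the values returned by predict()
same_link <- all.equal(unname(linear), unname(predict(fit, type = "link")))
same_prob <- all.equal(unname(prob), unname(predict(fit, type = "response")))
print(c(link = isTRUE(same_link), response = isTRUE(same_prob)))
```

Because the logistic function is monotonic, rank-based metrics such as the ROC AUC are the same whether they are computed on the linear predictor or on the probabilities.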

1.7.6 Comparison to Raw Model

To showcase the advantage of transformed modeling vs. raw modeling, here I'll estimate the logistic model from the observed variables and contrast it with the models generated from the transformed space.

The next lines of code compute the logistic model and display its testing performance:

##Training
rawmodelOverweight <- LASSO_1SE(Overweight~.,
                                trainingsetBMI,
                                family="binomial")
pander::pander(rawmodelOverweight$coef)
(Intercept) BodyFat Chest Abdomen Thigh Ankle Biceps
-39.9 0.0108 0.206 0.147 0.0275 0.064 0.0818
## Predict
rawpredicOverweight <- predict(rawmodelOverweight,testingsetBMI)
pr <- predictionStats_binary(cbind(testingsetBMI$Overweight,
                                   rawpredicOverweight),"Overweight")

pander::pander(pr$ClassMetrics)
  • accci:

    50% 2.5% 97.5%
    0.873 0.778 0.952
  • senci:

    50% 2.5% 97.5%
    0.873 0.779 0.948
  • aucci:

    50% 2.5% 97.5%
    0.873 0.779 0.948
  • berci:

    50% 2.5% 97.5%
    0.127 0.052 0.221
  • preci:

    50% 2.5% 97.5%
    0.877 0.788 0.952
  • F1ci:

    50% 2.5% 97.5%
    0.872 0.777 0.95

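The ROC AUC of the model created from the observed data (0.873, 95% CI: 0.779 to 0.948) is not significantly different from that of the outcome-blind (0.836, 95% CI: 0.735 to 0.92) or the outcome-driven (0.843, 95% CI: 0.75 to 0.927) transformed models: the confidence intervals overlap widely.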
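The aucci entries report the 50%, 2.5%, and 97.5% quantiles, which suggests a resampling-based interval. The sketch below assumes a percentile bootstrap; the actual procedure used by predictionStats_binary() is not described in this tutorial. The outcome y and the scores are simulated, and the auc() helper is hypothetical.

```r
# Toy sketch: percentile bootstrap interval for the ROC AUC.
set.seed(11)
n     <- 63
y     <- rbinom(n, 1, 0.5)
score <- y + rnorm(n)          # hypothetical classifier scores

# AUC computed from the Mann-Whitney rank statistic
auc <- function(y, score) {
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Resample subjects with replacement and recompute the AUC each time
boot_auc <- replicate(2000, {
  idx <- sample(n, replace = TRUE)
  auc(y[idx], score[idx])
})
ci <- quantile(boot_auc, c(0.5, 0.025, 0.975), na.rm = TRUE)
print(round(ci, 3))
```

Comparing overlapping intervals is only an informal check; a paired test on the same testing subjects would be a more direct way to compare two models.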

1.7.7 Comparing the Feature Significance on the Models

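These last lines of code compute the significance of the beta coefficients for both the observed model and the latent-based models. Note that the raw model is fitted on the training set, whereas the latent-based models below are fitted on the transformed testing sets. In the outputs shown, Abdomen and Biceps are statistically significant in the raw model; La_BodyFat, La_Neck, La_Chest, and Hip in the outcome-blind model; and Chest in the outcome-driven model.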


par(mfrow=c(2,2),cex=0.5)

## Raw model
rawlm <- lm(Overweight~.,trainingsetBMI[,c("Overweight",names(rawmodelOverweight$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.96182 0.40972 -9.669 4.30e-18 * * *
BodyFat 0.00588 0.00492 1.194 2.34e-01
Chest 0.01521 0.00781 1.947 5.31e-02
Abdomen 0.01486 0.00743 2.000 4.70e-02 *
Thigh -0.00120 0.00807 -0.149 8.82e-01
Ankle 0.02388 0.01761 1.356 1.77e-01
Biceps 0.03003 0.01229 2.444 1.55e-02 *
plot(rawlm)


## Outcome-blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_test[,c("Overweight",names(modelOverweight$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.00016 1.00747 -1.99 5.22e-02
La_BodyFat 0.02170 0.00865 2.51 1.52e-02 *
Age -0.00724 0.00474 -1.53 1.32e-01
La_Neck 0.13876 0.03292 4.22 9.55e-05 * * *
La_Chest 0.02887 0.01386 2.08 4.19e-02 *
La_Abdomen 0.02136 0.01841 1.16 2.51e-01
Hip 0.03482 0.00664 5.24 2.72e-06 * * *
La_Ankle -0.04120 0.03093 -1.33 1.88e-01
La_Wrist 0.09286 0.08511 1.09 2.80e-01
plot(Delm)



## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_testD[,c("Overweight",names(modelOverweightD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
Fitting linear model: Overweight ~ .
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.991061 0.83416 -3.5857 6.98e-04 * * *
La_BodyFat 0.011972 0.00946 1.2656 2.11e-01
La_Age 0.000371 0.00536 0.0693 9.45e-01
Chest 0.038195 0.00613 6.2281 6.07e-08 * * *
La_Abdomen -0.008565 0.01614 -0.5306 5.98e-01
La_Thigh 0.023977 0.01441 1.6642 1.02e-01
plot(Delm)

1.8 Conclusion

In conclusion, ILAA (Iterative Linear Association Analysis) is an unsupervised computational method for estimating linear transformation matrices. These matrices convert data sets into a latent-based space with a user-controlled degree of correlation. This tutorial has demonstrated the practical use of ILAA and its functions for estimating, predicting, and inspecting the transformations. These capabilities are useful in supervised learning scenarios, including linear regression and logistic models.